Psychoacoustics
Psychoacoustics is the study of how human beings perceive sound. Once sound has passed through the mechanical stages of the ear, which we can observe, and entered the brain as nerve signals, the science becomes cloudier. We collectively know very little about how the brain interprets these signals. What we can study, however, is the observable effects of the hearing process. For example, we can estimate our own hearing frequency range by listening to a frequency sweep from 0Hz upwards, and marking where the sound becomes inaudible. As another example, Fletcher & Munson studied human perception of loudness at different frequencies, as discussed below. Much remains to be learned about how we hear the world around us. This page also discusses the rationale for sample rate, which is a parameter of PCM digital encoding.

Human Hearing and Directionality

One of the most important factors in human hearing is directionality. We process two pieces of information to get a useful impression of the distance and direction of a sound source. This is very similar to binocular vision, i.e. having two receivers allows us to establish directionality. The two pieces of data used to establish directionality are:
*Inter-aural time difference, ITD (the time lag between incident sound arriving at the ear closest to the source, and arrival at the other ear)
*Inter-aural level difference, ILD (the difference in the sound pressure level or perceived loudness of a sound between the ears)

The time difference factor is easy to understand on the surface. If a sound comes from the right, it reaches our right ear before our left, which, depending on the severity of the angle, may in fact only receive sound reflected from a nearby wall. This time lag gives us all sorts of information about the sound itself and its relative location, as well as about the room around us.
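The size of the time lag described above can be estimated with a little geometry. The following is a minimal sketch, not a model of real hearing: it assumes a distant source, treats the ears as two points 0.18 m apart, ignores the way sound bends around the head, and takes the speed of sound as 343 m/s. The function name and both constants are illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
EAR_SPACING_M = 0.18        # assumed: illustrative distance between the ears


def itd_seconds(azimuth_deg, ear_spacing_m=EAR_SPACING_M, speed=SPEED_OF_SOUND_M_S):
    """Estimate ITD from the straight-line path difference between the ears.

    azimuth_deg is 0 for a source directly ahead and 90 for a source
    directly to one side. Diffraction around the head is ignored.
    """
    path_difference_m = ear_spacing_m * math.sin(math.radians(azimuth_deg))
    return path_difference_m / speed


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:>2} degrees: {itd_seconds(angle) * 1e6:.0f} microseconds")
```

Under these assumptions the lag is zero for a source directly ahead and grows to roughly half a millisecond for a source directly to one side, which shows how small the time differences involved are.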
In many senses we can "hear" the size of wherever we are, even with very little noise. The time lag also produces a phase difference between the ears, which is another cue we use to identify a sound's location. Phase describes the position of a wave within its cycle of oscillation. It is measured in degrees, with a wave running through 360 degrees in a single cycle. This is illustrated in Video 1. As we are dealing with sound in air, the phase of a sound refers in literal terms to whether your ears are receiving a region of high or low air pressure as part of the sound pressure wave. The combination of the difference in time and the difference in pressure or phase between the ears allows the brain to form a directional image of the sound source.

By exploiting these properties of our hearing, we can perform some quite impressive feats in recording and sound reproduction. Video 2 is an example of a binaural recording. If listened to on headphones, the reproduction of the recorded environment should be strikingly realistic in terms of perceiving the source location relative to the listening position. The term binaural here refers to the use of two microphones mounted on a dummy head, designed to create a realistic impression of being present in the room when the sound occurs. Because of the head-shadowing effect (Fig. 1), the human-like head affects ITD and ILD just as our own head does in reality. This simulated effect is quite convincing when done correctly, and Video 2 provides good examples of it in action.

Human Hearing and Frequency

Human perception of frequency takes place in the cochlea. This is a small, fluid-filled, snail-shell shaped bony structure inside the ear. It contains a large number of small hairs (known as stereocilia), which sit at the terminals of nerve chains leading to the brain.
When a pressure wave, transmitted to the cochlea from the eardrum, moves through the fluid, it stimulates these small hair cells, which in turn create nerve impulses that travel to the brain. As seen in Fig. 2, the stereocilia are arranged so that they gradually get longer and more limber towards the apex of the structure, in the centre. Each of these hairs responds to a particular frequency. They degenerate over time, which is the cause of age-related hearing loss, and damage to them is associated with conditions such as tinnitus (which, it is hypothesised, arises either from a particular hair cell vibrating indefinitely, or from a damaged nerve creating a perceived false tone).

The construction of the cochlea has some peculiarities which lead to variations in our ability to perceive sound within different frequency ranges. Two American acousticians, Harvey Fletcher and Wilden Munson, published a study in the 1930s which examined this phenomenon. What they produced is shown in Fig. 3, and is known as the Fletcher-Munson equal-loudness curves. These curves are a very useful piece of information for audio engineers. For example, they allow a mix engineer to account properly for the difference in perceived level within different frequency ranges, knowing that some bands may not require as much accentuation or gain as others.

Each curve joins the levels at which a set of participants judged sounds of different frequencies to be equally loud. The actual level (measured in dB SPL, or decibels of Sound Pressure Level) is shown on the Y axis, with frequency along the X axis. The perceived level is measured in phons. In a sense, the curves can be used to estimate the accentuation or attenuation that different bands would require for a sweep through all audible frequencies to sound as if it remained at exactly the same volume throughout.
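Because the Y axis of the curves is in dB SPL, it helps to remember that this is a logarithmic scale measured against a reference pressure of 20 micropascals. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and it does not reproduce any values from the curves themselves.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the standard reference for dB SPL in air


def spl_db(pressure_pa):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def pressure_ratio(level_difference_db):
    """Return the pressure ratio that corresponds to a level change in dB."""
    return 10.0 ** (level_difference_db / 20.0)


if __name__ == "__main__":
    print(f"1 Pa RMS is about {spl_db(1.0):.1f} dB SPL")
    print(f"A 20 dB boost is a pressure ratio of {pressure_ratio(20.0):.0f}")
```

A gap of 20 dB between two points on a curve therefore means a tenfold difference in sound pressure, even though listeners judged the two sounds to be equally loud.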
The equal-loudness data gives an interesting insight into the way that our anatomy affects our interaction with sound.

Perceptual Codecs

'Codec' is a combination of the terms 'coder' (encoder) and 'decoder'. A codec is defined as:
:noun
:1. a device or program that compresses data to enable faster transmission and decompresses received data.

Codecs, specifically lossy codecs, rely on the principles of psychoacoustics to allow some data to be discarded while keeping the drop in quality as small as possible. Lossy codecs which take advantage of these limits of human hearing are known as perceptual codecs. Some examples of perceptual codecs are .mp3 and .aac. These codecs typically focus on data which is present within an 'important' frequency range. For adults, the boundaries of this range are generally considered to be 20Hz up to anywhere between 15 and 18kHz, depending on listening history. The article "MP3 vs AAC vs FLAC vs CD" at Stereophile.com illustrates some of the effects that different codecs have on the audio they process.

Sample Rate (Nyquist Theorem)

The use of a baseline sample rate of 44,100Hz for CD audio is based on the Nyquist theorem. The theorem states that, to reproduce a signal without artifacts (aliasing), the sample rate must be at least double the highest frequency to be captured. The maximum human hearing range is generally taken to be 20Hz-20kHz. Because a wave travels through positive and negative domains (of pressure in air, or of voltage in an analogue system), at least two samples are needed per cycle, so the sampling rate must be double the maximum frequency which is to be captured. This gives 40,000Hz as a starting point. Additionally, incoming signals must be filtered before sampling to remove extraneous infrasonic (below 20Hz) and ultrasonic (above 20kHz) frequencies.
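The doubling rule, and the reason for filtering before sampling, can be checked with a few lines of arithmetic. The sketch below is illustrative only and the function names are hypothetical: it computes the minimum rate for a given top frequency, and the frequency at which an unfiltered tone above half the sample rate would reappear (its alias).

```python
def minimum_sample_rate(max_frequency_hz):
    """Lowest sample rate implied by the doubling rule (in practice, slightly more is needed)."""
    return 2 * max_frequency_hz


def alias_frequency(frequency_hz, sample_rate_hz):
    """Return the frequency a sampled tone appears at, folded into 0..sample_rate/2."""
    nyquist = sample_rate_hz / 2
    folded = frequency_hz % sample_rate_hz
    if folded > nyquist:
        # Tones above half the sample rate are mirrored back below it.
        folded = sample_rate_hz - folded
    return folded


if __name__ == "__main__":
    print(minimum_sample_rate(20_000))      # rate needed for a 20kHz upper limit
    print(alias_frequency(1_000, 44_100))   # in-band tone is unchanged
    print(alias_frequency(25_000, 44_100))  # out-of-band tone folds into the audible range
```

With a 44,100Hz sample rate, a 25kHz tone that escaped the input filter would be recorded as an audible 19.1kHz tone, which is why content above the audible range has to be removed before sampling.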
Because no perfect filter exists which can instantaneously remove all content beyond a given frequency, a margin has to be allowed for the filter's gradual roll-off, hence 44.1kHz.

Psychoacoustics Summary

*Humans perceive a frequency range between 20Hz and 20kHz; this is due to the structure of the cochlea
*We identify the location of a sound source by using cues from Inter-Aural Time Difference and Inter-Aural Level Difference, of which binaural recording techniques are a good demonstration
*Fletcher and Munson created a set of curves which represent the variations in our perception of volume level at different frequencies
*This information, combined with clinical knowledge of natural hearing loss, is used by certain digital audio formats, whose perceptual codecs apply compression without audibly limiting the frequency range of the programme material